Machine Translation - Corpus Linguistics

Author

  • Joel Hinz
Abstract

Because communication between people, companies and even nations matters so much, the need for good translators has long been obvious around the world, and since humans are fallible, it is not surprising that many attempts have been made throughout history to automate the process of translation. This automated approach is called machine translation. Corpus linguistics, a branch of the machine translation tree, uses statistical methods to analyse text samples – corpora – and draws conclusions from the results. The goal of this paper is to identify and account for the problems and possibilities associated with this kind of statistical approach to translation, as well as to give a brief overview of the history of the topic and a glimpse of projects currently under research.
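The abstract describes the statistical approach only in general terms. As one illustration of how statistics gathered from a corpus can feed translation, here is a minimal Python sketch of IBM Model 1, a classic word-translation model estimated with expectation-maximisation (EM) on a sentence-aligned parallel corpus. It is not taken from the paper: the function name `train_ibm_model1`, the toy German-English corpus and the fixed iteration count are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(tgt_word | src_word) with EM.

    corpus: list of (src_tokens, tgt_tokens) pairs from a sentence-aligned
    parallel corpus. Returns a dict mapping (tgt_word, src_word) -> probability.
    """
    tgt_vocab = {tw for _, tgt in corpus for tw in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Before any statistics are gathered, every target word is equally likely.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (tgt_word, src_word) links
        total = defaultdict(float)  # expected number of links per src_word
        # E-step: share each target word among the source words of its sentence,
        # in proportion to the current translation probabilities.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # M-step: turn expected counts into probabilities for each source word.
        t = defaultdict(float, {(tw, sw): c / total[sw] for (tw, sw), c in count.items()})
    return dict(t)


# Hypothetical toy corpus (German source, English target), for illustration only.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
probs = train_ibm_model1(corpus)

# Most probable English word for each German word.
src_vocab = {sw for src, _ in corpus for sw in src}
best = {sw: max((p, tw) for (tw, s), p in probs.items() if s == sw)[1] for sw in src_vocab}
print(best)  # expected mapping: das->the, haus->house, buch->book, ein->a
```

No dictionary is supplied: because "das" co-occurs with "the" in two sentence pairs, EM gradually shifts the probability of "the" towards "das" and away from "haus", so the word pairings emerge from corpus statistics alone.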


Similar sources

Principles of corpus linguistics and their application to translation studies research

Corpora have been put to many different uses in fields as varied as natural language processing, critical discourse analysis and applied linguistics, to mention just a few. As is to be expected, within each of those areas corpora fulfil different roles, from providing data to build statistical machine translation systems to revealing ideological stance in politically sensitive texts. ‘Corpus lin...


Building Parallel Corpora for SMT System: A Case Study of English-Manipuri

Statistical Machine Translation (SMT) systems are developed using a sentence-aligned parallel corpus. The difficulty is that for many language pairs no parallel corpus of the required size exists. Preparing a large-scale parallel corpus takes time and demands linguistic skill. In the present work, the various issues of a quality parallel corpus and a technique that extracts p...


EVBCorpus - A Multi-Layer English-Vietnamese Bilingual Corpus for Studying Tasks in Comparative Linguistics

Bilingual corpora play an important role as resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is of significance to provide a gold-standard for developing and evaluating machine translation models and comparative linguistics tasks. This paper presents research on building an English-Vi...


CzEng: Czech-English Parallel Corpus release version 0.5

We introduce CzEng 0.5, a new Czech-English sentence-aligned parallel corpus consisting of around 20 million tokens in either language. The corpus is available on the Internet and can be used under the terms of a license agreement for non-commercial educational and research purposes. Besides the description of the corpus, also preliminary results concerning statistical machine translation experim...


Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...



Publication date: 2005